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Climate models have become an important tool in the study of 
climate and climate change, and ensemble experiments consisting of 
multiple climate-model runs are used in studying and quantifying the 
uncertainty in climate-model output. However, there are often only a 
limited number of model runs available for a particular experiment, 
and one of the statistical challenges is to characterize the distribution 
of the model output. To that end, we have developed a multivariate 
hierarchical approach, at the heart of which is a new representation 
of a multivariate Markov random field. This approach allows for flex- 
ible modeling of the multivariate spatial dependencies, including the 
cross-dependencies between variables. We demonstrate this statistical 
model on an ensemble arising from a regional-climate-model experi- 
ment over the western United States, and we focus on the projected 
change in seasonal temperature and precipitation over the next 50 
years. 

1. Introduction. Many processes in the Earth system cannot be directly 
observed, and computer modeling has become a primary mode for studying 
these processes. These computer models often encapsulate entire fields of 
knowledge, providing a virtual laboratory for understanding physical rela- 
tionships and serving as a basis for making predictions. The Earth's climate, 
for example, is determined by the flows of energy, water, gases, etc., within 
and between the different components of the climate system, including at- 
mosphere, oceans, terrestrial and marine biospheres, sea ice, etc. Climate 
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models attempt to represent this system, as well as to incorporate anthro- 
pogenic forcings to assess the impact of human intervention. 

While climate models are deterministic, the output generated by these 
models is complex and subject to a number of sources of uncertainty. Initial 
climate states, assumptions about future forcings, and, of course, our under- 
standing (or lack thereof) of the physical processes and their representation 
in computer models, are all issues that may lead to uncertainty in the model 
output. To gain a better understanding of this model uncertainty, there is 
an increasing use of ensembles consisting of multiple model runs. These ex- 
periments may involve varying initial conditions (simple ensembles), model 
physics (perturbed-physics ensembles), specific models (multi-model ensem- 
bles), or some combination of all three, in an attempt to capture the range 
of variation in the model output. 

Climate is often considered the long-term average (or, more generally, the 
long-term distribution) of weather, and climate models typically simulate 
decades of weather to capture this long-term behavior. These simulations 
can take weeks or even months of computing time on high-performance su- 
percomputers. Furthermore, as interest continues to grow in climate models 
with higher spatial resolution, in particular, to study regional and local im- 
pacts of climate and climate change, the computational demands of climate 
modeling continue to grow. Even with the increased use of ensembles, the 
number of ensemble members is typically limited due to constraints on the 
models available, computation, funding, etc. Hence, statistical methods are 
necessary to quantify the distribution and breadth of variation of the model 
output in the ensemble. To this end, we introduce a hierarchical statistical 
model to capture the multivariate spatial distribution of the output fields 
(e.g., the joint spatial distribution of temperature and precipitation) from a 
regional-climate-model ensemble. With this statistical representation of the 
model output, we can present probabilistic projections of regional climate 
change based on the ensemble. Furthermore, by considering multiple output 
fields simultaneously, we can incorporate correlations between these fields, 
improving joint projections necessary for many climate-impacts studies (e.g., 
agriculture, water management, public health, etc.). 

1.1. Regional climate models. The climate of a region is determined 
by processes that exist at planetary, regional, and local spatial scales and 
across a wide range of temporal scales (multi-decadal to sub-daily). This 
creates serious difficulties when attempting to construct computer models 
that can simulate regional climate. Atmosphere-ocean general circulation 
models (AOGCMs or, more simply, GCMs) couple an atmospheric model 
with an ocean model and seek to simulate the Earth's global climate sys- 
tem. Due to model complexity, the need to simulate climate over decadal 
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and even centennial time scales, and computational limitations, these mod- 
els typically have grid boxes on the spatial scale of 200 to 500 km. While 
these models are extremely useful for investigating the large-scale circulation 
and forcings that affect the Earth's global climate, there are limitations to 
their use for regional and local projections that might be of interest to the 
climate-impacts community. 

Recognizing the need to include large-scale processes, even when studying 
regional climate, as well as the ability of GCMs to capture such phenomena, 
there is considerable attention on developing downscaling methodologies. 
Downscaling refers simply to generating information on the basis of a GCM, 
but at spatial scales below that of the GCM. There are two main types 
of downscaling, dynamical and statistical. Statistical downscaling is a com- 
putationally efficient approach that uses empirical relationships to connect 
the coarse-resolution GCM output to regional and local variables. This ap- 
proach needs fine-scale and long-time-duration observational data, and there 
is some uncertainty about the stability of these empirical relationships over 
long periods of time, especially with varying forcings (e.g., increasing CO2 
concentrations) . 

Dynamical downscaling involves using high-resolution climate models. Of 
course, there are limitations due to computational demands and a price to 
be paid for the higher resolution. One approach uses only the atmospheric 
component of a GCM with, for example, observed ocean temperatures. These 
so-called time-slice experiments are globally consistent, but they generally 
use many of the same formulations as the coarse-scale GCMs. 

Regional climate models (RCMs) are the focus of this research and are 
another dynamical approach based on high-resolution climate models. These 
models typically focus on a limited spatial domain, have grid boxes on the 
scale of 20 to 100 km, and there are often simplifications of ocean processes 
in these models. They also use initial conditions and time-dependent lateral 
boundary conditions from the GCM (e.g., winds, temperature, and mois- 
ture). Hence, global circulation and large-scale forcings are consistent with 
the GCM, but, with the higher-resolution forcings included (e.g., topogra- 
phy, land cover, etc.), these models have the potential to actually enhance 
the simulation of climate on regional and local scales. Of course, RCMs can 
be influenced by potential biases in the GCM, and there is a lack of two-way 
interactions between the driving GCM and the RCM. 

For those interested in understanding more about climate and climate 
change and global and regional climate modeling, we refer them to the Inter- 
governmental Panel on Climate Change (IPCC, http://www.ipcc.ch) as- 
sessment reports, and, in particular, to the contributions of Working Group 
I to the Third Assessment Report [Houghton et al. (2001)] and the Fourth 
Assessment Report [Solomon et al. (2007)]. These documents not only in- 
clude excellent overviews of the issues but also numerous scientific references 
for more in-depth coverage. 
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1.2. A statistical representation of climate-model output. We propose a 
statistical model for combining the output from simple ensembles of RCMs 
in order to characterize the distribution of the model output. This statisti- 
cal model will be formulated through what has now become the standard 
three-level hierarchical formulation, namely, data model, process model, and 
parameter model (prior distribution). The data model links the RCM out- 
put to an unobserved spatial process, where this process model is formulated 
to capture the spatial variation in the RCM output. Both the data model 
and the process model depend on unknown parameters to which a prior 
distribution is assigned. 

This basic hierarchical approach has been used in other settings for com- 
bining climate-model output. For example, Tebaldi et al. (2005) focuses on 
univariate summaries of temperature change from an ensemble of GCMs. 
Smith et al. (2009) also explore univariate summaries of temperature change 
from an ensemble of GCMs and link these summaries over different regions 
around the globe. Tebaldi and Sanso (2009) extend these approaches to 
bivariate models of temperatures and precipitation. Purrer et al. (2007a, 
2007b) study univariate spatial summaries and Berliner and Kim (2008) 
study bivariate time series, again constructed from an ensemble of GCMs. 
More recently, Cooley and Sain (2010) and Kang, Cressie and Sain (2010) 
have applied hierarchical models to ensembles of RCMs. However, to our 
knowledge, this paper is the first approach of this kind for a spatial analysis 
of multivariate output from RCMs. 

At the heart of this statistical model is an implementation of a multivari- 
ate Markov random field (MRF) for lattice data that offers great flexibility 
in modeling the spatial cross-dependencies between variables. We emphasize 
this capability in this setting, as the underlying physical behavior of the cli- 
mate system suggests the potential for significant spatial cross-dependencies, 
in particular, for key variables like temperature and precipitation. While 
these two output fields are the focus of this work, we note that the basic ap- 
proach based on the multivariate MRF presented here can be easily modified 
to consider other output fields as well. 

MRF models are also excellent tools for analyzing data laid out on reg- 
ular spatial lattices, such as those associated with images, remote-sensing, 
climate models, etc., or on irregular spatial lattices, such as U.S. census di- 
visions (counties, tracts, or block-groups) or other administrative units. In 
contrast to geostatistical methods that model spatial dependence through 
the specification of a covariance function (typically based on distances be- 
tween spatial locations), Gaussian MRF models represent the conditional 
expectation of an observation at a spatial location as a linear combina- 
tion of observations at neighboring locations. Spatial dependence is induced 
through this conditional autoregression and the choice of neighborhoods. 
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In addition, using a MRF formulation will allow us to incorporate compu- 
tational advantages due to the gridded nature of the climate-model output 
and the sparseness that is characteristic of the spatial-precision matrices (in- 
verse covariance matrices) that are specified in such models. Furthermore, 
the multivariate nature of the statistical model (more than one model out- 
put considered at each grid box) will allow for more complex inferences that 
are of use to those studying impacts of climate and climate change. 

1.3. Outline. In Section 2 an overview of MRF models is presented, fol- 
lowed by a description of the new formulation in Section 3. Section 4 con- 
tains the details of our hierarchical specification. An extensive study of an 
application using a simple ensemble of RCM output, focusing on changes 
in seasonal temperature and precipitation, will be presented in Section 5. 
Concluding remarks are given in Section 6. 

2. MRF and CAR models. Besag (1974) laid out the basic framework for 
MRF models. For random variables yi, . . . ,y„ observed at n locations on a 
spatial-lattice structure, the collection of conditional distributions f{yi\y-i), 
i = 1, . . . ,n (where y_j refers to all random variables except the ith. one), can 
be combined under certain regularity conditions to form a joint distribution 
/(yi, . . . , ?/„). Rue and Held (2005) can be consulted for an excellent expo- 
sition of the theory of MRFs; see also the reviews in the texts by Cressie 
(1993), Banerjee, Carlin and Gelfand (2004), and Schabenberger and Gotway 
(2005). Conditional autoregressive (CAR) models are special cases of MRF 
models where the conditional distributions are assumed to be Gaussian. 

2.1. Univariate CAR models. In the univariate setting and assuming 
Gaussian conditional distributions for f{yi\y-i), the conditional mean and 
conditional variance associated with f{yi\y-i) are specified as 



where 6jj = 0; i = l,...,n. Under regularity conditions, this collection of 
conditional distributions gives rise to a joint Gaussian distribution, 



where /i = (/xi, . . . , ^n)' ■, I is an n x n identity matrix, B is the n x n matrix 
with the (z, j)th element bij, and T = diag(r^, . . . ,t^). The regularity condi- 
tions are on the spatial-dependence parameters, {hj}, and they ensure that 
the resulting matrix, (I — B)~^T, is a bona fide covariance matrix; that is, 
(I — B)~^T is symmetric and positive-definite. The spatial dependence is in- 
duced by the autor egression, which is determined by setting the coefficients 
bij ^ if j G Ni (and otherwise), where Ni is a collection of indices that 
define a neighborhood of the ith location in the spatial lattice. 



n 




(1) 
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2.2. Multivariate CAR models. Mardia (1988) extended the MRF model 
of Besag (1974) to the multivariate setting where there is more than one 
measurement at each lattice point. In particular, let be a p-dimensional 
random vector. Then, for i = 1, . . . , n, let /(yi|y-i) be a Gaussian distribu- 
tion of Yi, given all random vectors except the ith, with 

E[yijy-i] = fJ^i + ^Bij(yj - tij) and Var[yi|y_i] = Tj, 

where fx^ is a p-dimensional vector, Bjj is a p x p matrix, and Tj is a 
p X p covariance matrix. Assume that BjjTj = TjB^^, for all i,j = 1, ... ,n 
(to ensure symmetry). Further, assume Bjj = —I and Bjj ^ for j G Ni and 
i = 1, . . . ,n. Under the assumption that the np x np matrix Block(— B^) (i.e., 
a block matrix with blocks given by — Bjj) is positive-definite, Mardia (1988) 
establishes that y = (y'^, . . . ,y^)' follows a Af{fi., S) distribution where = 
(/I'l, . . .,fi'J and E = (Block(-T-iBi,))-i. 

As written, this formulation is overparameterized and there have been 
a number of efforts in the literature that focus on ways of specifying the 
parameters in the basic model. See, for example, Billheimer et al. (1997), 
Kim, Sun and Tsutakawa (2001), Pettitt, Weir and Hart (2002), Carlin and 
Banerjee (2003), Gelfand and Vounatsou (2003), Jin, Carlin and Banerjee 
(2005), Jin, Banerjee and Carlin (2007), Daniels, Zhou and Zou (2006), Sain 
and Cressie (2007), among others. 

3. An alternative formulation of a multivariate MRF. It is our experi- 
ence that the basic multivariate MRF model of Mardia (1988) is difficult to 
implement in practice without dramatic simplification of the matrices rep- 
resenting the spatial dependence parameters [e.g., Billheimer et al. (1997)] 
or the use of restrictive priors on the elements of these same matrices [Sain 
and Cressie (2007)]. Here, we fully develop a new way of representing mul- 
tivariate lattice data that was suggested, but not implemented, by Sain and 
Cressie (2007), for the purposes of analyzing an RCM experiment. 

The multivariate extension of the framework laid out by Besag (1974) 
and explored by Mardia (1988) is based on the assumption of a multivariate 
observation at each point on a standard two-dimensional spatial lattice. 
Fundamental to our approach is thinking of multivariate lattice data as 
univariate data on a more complex lattice structure. 

In particular, this more complex lattice structure is conceptualized as 
a "stacking" of the lattices associated with each variable. Neighborhoods 
are defined by connections between locations for each variable within a lat- 
tice and again for locations across each lattice structure. A two-dimensional 
example of these different types of neighborhoods is shown in Figure 1. 
A within-variable dependence structure is induced by connecting locations 
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within a lattice associated with a particular variable (left frame). Cross- 
dependencies, both within a location (middle frame) and across locations 
(right frame), are induced through connections between the lattices for dif- 
ferent variables. 

The key feature of this approach is that it still falls within the original 
univariate framework of Besag (1974) outlined in Section 2.1. Let yij denote 
the jth variable observed at the ith location on the lattice. Then, for each 
Gaussian conditional distribution, the mean and variance need to be spec- 
ified. With sums on the right-hand side corresponding to specific types of 
neighborhoods in Figure 1, the conditional mean is given by 



(2) 



with the conditional variance given by Var[yjj =T^j, for all lattice 

points i = 1, . . . ,n, variables j = 1, . . . ,p, and with y~{ij] denoting all com- 
ponents of y except for the («, j)th. 

In the conditional mean, the coefficients in the first summation, {hijkj]i = 
l,...,n,k€Ni,j = l,...,p}, represent connections within a particular layer 
and control conditional dependence between the ith lattice point and neigh- 
boring points for the jth variable (left frame in Figure 1). The coefficients in 
the second summation, {bijn; i = 1, . . . ,n,j ^ i = 1, . . . ,p}, represent connec- 
tions across layers at the same lattice point and control conditional depen- 
dence between variables j and £ at the ith lattice point (middle frame in Fig- 
ure 1). Finally, the coefficients in the third summation, {bijki; i = 1, . . . , n. A; € 
-^ii J 7^ ^ = 1) ■ • ■ represent connections between locations across layers 
for different variables and control conditional cross-spatial dependence (right 



I 





Fig. 1. Examples of different types of neighborhoods. The left frame shows a within-vari- 
able spatial neighborhood, while the middle frame shows a within-location neighborhood. 
The right frame demonstrates the neighborhood associated with cross-variable connections. 
See also equation (2). 
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frame in Figure 1). Of course, all of these conditional-dependence parame- 
ters and the variances {T^j} must satisfy regularity conditions that yield a 
symmetric, positive-definite covariance matrix for the joint distribution. 

Some simplification of this basic structure is necessary to reduce the di- 
mensionality of the parameter space. Ordering the data as y = [y'^, . . . ,y^]', 
where y^ = [yn, . . . ,yip]' represents variables at the ith lattice point, we 
see from (1) that the joint distribution is Gaussian with mean given by 
fi = [fi[, . . . , fj,'^y and /ij = [iiii, . . . , Hip]', and with an np x np covariance 
matrix given by 



(3) 



Ai Bi2<5i2 

621^21 A2 



Bn_l n'^n— 1,' 



B^iJ, 



nl 



^n,n—l^n,n—l 



where 6ik = 1 if k & Ni and otherwise. Each p x p block is given by 



1 



^i£ij 



^ijii 



or Bi 



ik 



-hi 



Ikl 



Jilkj 



"^ijke 



bipkp ■ 



where —bijn and —bmj are arbitrary off-diagonal elements of Aj, and —h 

diag(r- 



and 



ijkt 
2 

11' 



—biikj are arbitrary off-diagonal elements of Bj^. Finally, T 
t2 ^ 

. . . , I ip, . . . , I . . . , I j^pj- 

In general, we shall assume that the neighborhood structure is symmetric; 
that is, if the kth lattice point is a neighbor of the ith lattice point, then the 
ith lattice point is a neighbor of the fcth {5ik = Ski)- We shall also assume that 
T^j = Tj for all j , implying a separate variance for each variable that does not 
vary with location. Hence, T = (X)diag(T), where r = [r^, . . . , r^]'. Further 
assumptions on the spatial-dependence parameters are made to reduce the 
dimensionality of the parameter space. 

To address symmetry and variance homogeneity across location, it suffices 
to examine the components of specific blocks in the inverse of the covariance 
matrix given by (3). First, the diagonal blocks are given by 

l/T-f -bijie/rf 



diag(T) ^Ai 



-biUj/Tj 



By symmetry, the corresponding off-diagonal elements should be equal; that 
is, bijii/rf = biiij/rf. Setting bijn = PjnTj/n, with pij = pj^, is one way of 
achieving the desired result. Then, diag(T)^^Ai = diag(T)~^/^Adiag(T)~^/^, 
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where 



-Pje 



-Pji 1 
For the off-diagonal blocks, symmetry demands that 

(4) diag(r)-^B,fc = [diag(r)-iBfci]'. 

Assummg i> k, the left-hand side of (4) is given by 



diag(r) ^Bj^ 



-bilkl/Tf -bijki/Tj 



-bi£kj/Ti 



Likewise, setting bijki = (pjiTj/n gives diag(T) ^Bik = diag(T) ^/^B 
diag(T)~^/^, where 



B 



^11 



and, for i> k, set diag(T) ^B^j = diag(T) ^/^B'diag(T) to satisfy (4). 
The covariance in (3) then simplifies to 



(5) 



-l/2l 







B6, 







-l/2l 



where r^/^ = [ti, . . . ,Tp]' and where {Sij} are indicator functions for neigh- 
borhood dependence. 

The specifications above simply ensure symmetry and reduce the number 
of parameters that must be estimated. Of course, the collection of spatial- 
dependence parameters, {pji} and {^j^}, must be chosen to ensure that (5) is 
a positive-definite covariance matrix. The final model has p{p — l)/2 within- 
location dependence parameters, {pje}, and between-location spatial- 
dependence parameters, {4>ji}, in addition to the p variance parameters, 
{t|}, and any parameters that are used to define the means. 

4. A hierarchical model for an RCM experiment. Let the n-dimensional 
vector Yrj denote the output of an RCM, in particular, the rth ensemble 
member for the jth variable. In this work we focus solely on simple ensem- 
bles; that is, each member of the ensemble represents a perturbation of initial 
conditions for a single model. Potential extensions of this basic framework 
for perturbed physics or multi-model ensembles is discussed in Section 6. 
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At the first level of the hierarchy, the data model assumes that the vectors 
Yrj-,f = ^1 ■ ■ ■ I'm,] = ^1 ■ ■ ■ iPi are independent with 

(6) 

where m indicates the number of ensemble members. In the mean structure, 
we allow for fixed effects common to all ensemble members within the jth 
variable (XiQj) and random effects specific to the rth ensemble member 
within the jth variable (X2/9^j). Spatial random effects are included through 
hrj , and cTj represents a variable-specific variance. 

The process model has two parts. First, the vectors [/3^;^, . . . , /3^p]', r = 
1, . . . , m, are assumed to be independent with 

where Sfe is a, pq x pq covariance matrix with q the number of columns 
of X2. Second, the vectors [h^;^, . . . , h^^]', r = 1, . . . , m, are assumed to be 



(7) 




independent with 



(8) 



h 



V 




{T/},w},{M~Ar : ,v({T/},{p,a,{<^ia) 



(Note that the vectors . . . , /fl^p]' ^"^^ [^rij • • • j h^p]' are assumed to be 
independent as well.) The first part, (7), focuses on linking the random 
regression coefficients specific to each ensemble member, while the second 
part, (8), imposes a multivariate structure on the spatial random effects. The 
covariance matrix V takes its form from the multivariate Markov random 
field in (5). 

The final level of the hierarchy assumes prior distributions on {ctj}, {ctj}, 
{hj}, Sft, and the parameters of the spatial covariance, namely, {'^'J}, 
{pjf}, and {<^ji\. Typically, these priors will be vague or noninformative as 
well as independent. In addition, the prior distribution on {pj^} and {^j^} 
must ensure that the resulting covariance matrix is positive-definite. 

Prom Bayes' theorem, the posterior distribution for the three-level hier- 
archical model is given by 

P({^,,.}, {h,,}, {a,}, {/3,}, S,, {h,}, {t|}, {</.,a|Y) 

ocP(Y|{a,},{/3,,},{h,,},{a,}) 
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X P({/3,^.}|{/3^.}, i:,)Pi{hrj}\{h,}, {r|}, {p,4, 

xP({a,})P({a,})P({/3,})P(5]b)P({h,})P({T|})P({p,,},{'A,4). 

It is clear that there is no closed-form solution for the posterior, and here 
Markov chain Monte Carlo (MCMC) [e.g., Gilks, Richardson and Spiegelhal- 
ter (1996)] is used to simulate realizations from the posterior distribution. 
In particular, we implement a Gibbs sampler [Geman and Geman (1984); 
Gelfand and Smith (1990); Gelfand et al. (1990)], incorporating Metropolis- 
Hastings steps [Metropolis et al. (1953); Hastings (1970)] where necessary. 

One benefit of a MRF is that the specification involves the precision or 
inverse covariance matrix, and this matrix is typically sparse; that is, many 
of the elements of the matrix are zero. Methods for storing and manipulating 
such matrices have been widely established [e.g., Davis (2006)], and there 
is great potential for computational efficiency associated with sparse-matrix 
methods. There are now several sparse-matrix packages in the R statistical 
computing environment [R Development Core Team (2007)]. However, the 
spam package [Furrer (2008)] has functionality that is well suited for imple- 
menting MCMC with a MRF model. For example, the sparse Cholesky de- 
composition is one of the most important computational devices used when 
implementing the Gibbs sampler for MRF models such as those developed 
in this work. A typical sparse Cholesky decomposition involves three steps: 

(1) reorganizing the matrix by permuting the rows/columns to achieve a 
pattern of sparsity that is more efficient for the sparse Cholesky algorithm; 

(2) a symbolic step that identifies the pattern of sparsity in the matrix; 
and (3) the numerical computation. The first two steps do not change when 
manipulating matrices repeatedly with the same patterns of sparsity (as is 
the case here). The spam package allows one to achieve even greater com- 
putational efficiency by not repeating these steps during the course of the 
MCMC. For more details on this and other computational benefits gained 
from incorporating sparse-matrix methods in such applications, see Furrer 
and Sain (2010). 

5. The RCM experiment. Leung et al. (2004) describe an RCM experi- 
ment using the NCAR/DOE Parallel Chmate Model to drive the NCAR/Penn 
State Mesoscale Model (MM5) as an RCM. The experiment produced a con- 
trol run from 1995-2015 and three future runs (ensemble members) from 
2040-2060. The domain consisted of the western United States and part of 
western Canada, and the model used a "business as usual" climate scenario 
incorporating a 1% annual increase in the amount of greenhouse gases. 

The n = 44 X 56 = 2464 grid boxes form a regular lattice. Since a long- 
run average of weather is one way to quantify climate, typical summaries of 
climate model runs include seasonal averages of temperature and precipita- 
tion with the length of the integration often determined by computational 
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Fig. 2. Top row shows the three differences in winter midpoint temperature (° K), while 
the bottom row shows the three differences in total precipitation (inches). 

considerations. Hence, twenty-year winter (December, January, and Febru- 
ary) average temperature and average total precipitation were computed for 
each grid box and for each of the control and the three future runs. Differ- 
ences between the future and the control were calculated, yielding change- 
in-temperature and change-in-total-precipitation variables. Hence, there are 
p = 2 variables and m = 3 ensemble members, giving six fields to be ana- 
lyzed. These spatial fields for the winter season are shown in Figure 2. A 
second, separate analysis with the same structure is also presented for the 
twenty-year summer (June, July, and August) change in temperature and 
change in total precipitation. 

We note that the statistical models outlined in this work focus on using 
Gaussian assumptions for average precipitation from an RCM. (An assump- 
tion that was verified through exploratory analysis.) However, other models 
for precipitation, in particular, for extreme precipitation, are possible; see, 
for example, Sanso and Guenni (2004), Schliep et al. (2010), and Cooley and 
Sain (2010). 

5.1. Model specification. We now outline some specifics about the statistical- 
model specification. Consider the data model (6). After some exploratory 
analysis, (scaled) latitude, longitude, and elevation were used as covariates 
in the common regression component (Xicij), to which a random intercept 
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across ensemble members was added (X.2Prj)- The prior covariance matrix 
for the random intercept in (7) was also simplified to Xl;, = cr^Ip. 

The prior distribution for the variance parameters, {c'j} and cr^, were 
taken to be noninformative; that is, they were assumed to follow the prior 
distribution -P(cr^) oc independently. The prior distributions for the 

regression parameters, {cij} and {Pj}, were taken to be mean- zero Gaus- 
sian distributions with covariance matrices proportional to the identity and 
with large variances (i.e., = 10 and = 100). The prior distributions 
for {hj} were also taken to be mean-zero Gaussian distributions with co- 
variance matrices proportional to the identity and with large variances (i.e., 

<. = io). 

Finally, the prior specification for the joint distribution of p, (pn, 022) 
(j)i2, and (j)2i was taken to be uniform over the range of values that yield a 
positive-definite covariance matrix. This region was identified using rejection 
sampling based on a sparse Cholesky decomposition. A simple simulation 
study of a univariate Markov random field's spatial dependence parameter 
(not reported here) suggested that concentrating priors on a subregion of 
the parameter space leads to a biased estimate when the true parameter lies 
outside this region. While this may not seem surprising, the lesson learned 
is that there has to be a good reason to choose nonuniform priors for these 
bounded spatial-dependence parameters. 

5.2. Results for the winter season. Posterior distributions were obtained 
using MCMC algorithms, and considerable care was taken to ensure the con- 
vergence of the parameters in the MCMC. This is especially true with respect 
to the conditional-dependence parameters, where our experience has shown 
that straightforward approaches can lead to disappointing performance (i.e., 
very slow mixing and convergence). Ten chains were run, each with random 
starting values; the starting values for the conditional-dependence parame- 
ters were chosen uniformly across the space of values that yield a positive- 
definite covariance matrix. 

A Gibbs sampler was implemented that involved three distinct regimes. 
In the first regime (2500 iterations), each of the conditional-dependence 
parameters was updated one at a time using a Metropolis-Hastings algo- 
rithm. Gaussian proposal distributions were used, with periodic updates of 
the proposal variance to achieve an approximate 20% acceptance rate. In 
the second regime (the next 10,000 iterations), p, (^12, and (^21 were up- 
dated simultaneously using a Metropolis-Hastings algorithm with a multi- 
variate Gaussian proposal distribution. Again, the proposal covariance ma- 
trix was updated periodically to achieve an approximate 20% acceptance 
rate. Other conditional-dependence parameters were still updated using a 
univariate Metropolis-Hastings algorithm. Finally, in the third regime (the 
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last 10,000 iterations), p, (pi2, and (p2i were again updated simultaneously, 
but no further updates of the proposal distribution were made. Convergence 
of the posterior distributions of the parameters in the MCMC was monitored 
using both graphical and numerical methods [e.g., Gelman (1996)]. Posterior 
distributions were then estimated by sampling from the third regime. 

Of particular interest are the conditional-dependence parameters, since 
these control the nature and degree of the spatial correlation in the model. 
Figure 3 shows scatterplots and kernel estimates of the distribution of 0ii 
(temperature) and 022 (precipitation), the parameters that control the con- 
ditional dependence between lattice points within a layer (Figure 1, left 
panel). The distributions show that there is considerable (conditional) spa- 
tial dependence within each variable, as the distributions tend to be concen- 
trated near the positive boundary of possible values for (pu and (/)22- There 
is evidence of a slightly stronger dependence for temperature {(pn). 

Trace plots and other diagnostics for p, <^i2, and 4>2i suggest convergence 
after about 10,000 iterations, which corresponds to the end of the second 
sampling regime. These three parameters control the dependence structure 
across variables; p summarizes the within- location dependence (Figure 1, 
middle panel) and (pi2, 4>2i summarize the cross- variable dependence (Fig- 
ure 1, right panel). The estimated posterior mean and posterior standard 
deviation for p is —0.12 and 0.014, respectively. A negative value for p sug- 
gests that an increasing temperature is (conditionally) associated with a 
decreasing total precipitation. 

Figure 4 highlights the distribution of (pi2 and 021 • The strong correlation 
between these two conditional cross-correlation parameters is clearly shown 




Fig. 3. Left frame shows scatterplot of a random sample of 10,000 values of (pn and 
4>22 (1000 from each of the 10 chains). Contours represent approximate 25, 50, and 75% 
contours of a kernel density estimate. Right frame shows kernel density estimates of the 
marginals for (ftn (blue) and (f>22 (red). 
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Fig. 4. Left frame shows scatterplot of a random sample of 10,000 values of 4>i2 and 
4>2i (1000 from each of the 10 chains). Contours represent approximate 25, 50, and 75% 
contours of a kernel density estimate. Right frame shows kernel density estimates of the 
marginals for cj>\2 (red) and 021 (blue). 

in the left frame of Figure 4. However, there is another feature of note: There 
is compehing evidence of asymmetry in the strength of these two parame- 
ters, with roughly 85% of the sampled points being above the line y = x. 
This suggests that there is higher conditional dependence between temper- 
ature values and neighboring total precipitation than there is conditional 
dependence between total precipitation values and neighboring temperature 
values. Almost all of the published models of multivariate MRFs assume 
012 = 4'2i , something we have argued previously as being overly restrictive 
[Sain and Cressie (2007)]. Our posterior inference shows the inappropriate- 
ness of such an assumption in this case. 

Figure 5 shows posterior means for the fixed regression components (left 
column), the spatial random effects (middle column), and their sum (right 
column), for the change in winter average temperature (top row) and the 
change in total winter precipitation (bottom row). The fixed effects show a 
clear latitudinal effect as well as an east-to-west gradient. For precipitation, 
there is a more dominant east-to-west gradient. The spatial random effects 
for the change in temperature seem to follow the features of the topography, 
and are, in general, of smaller magnitude than the fixed effects. The spatial 
effects for the change in total precipitation also follow the features of the 
topography, but there are additional strong local features, for example, in 
northern California. In contrast to temperature, the spatial effects for total 
precipitation are larger relative to the fixed effects. 

The sum of the fixed effects and the spatial random effects for temperature 
shows a consistent pattern of winter warming on average throughout the 
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Fig. 5. Posterior means for the regression (left), the spatial effect (middle), and the sum 
(right) for the winter season. The top row represents the change m midpoint temperature 
(° K), while the bottom represents the change in total precipitation (inches). 



west, while the sum for total precipitation shows patterns that are much 
more localized. The most dominant signal for total precipitation is indicated 
by the regions of sharp decline in winter precipitation in northern California 
and the Pacific northwest. 

To aid in the identification of areas that might be at most risk for change, 
as projected by this regional-climate-model experiment, Figure 6 shows the 
result of a hierarchical clustering based on the posterior distribution of the 
mean change in temperature and total precipitation for each grid. One thou- 
sand samples were drawn from the posterior distribution of the mean change 
in temperature and total precipitation for each grid box. The distance metric 
used to join clusters was a symmetrized version of the Kullback-Leibler dis- 
tance, which was based on the assumption of bivariate normality within each 
cluster. Hence, the clustering focuses not only on the mean of the posterior 
distribution but also includes information about the covariance structure of 
the changes for each grid box. The scatterplot in the left frame of Figure 
6 shows the posterior means for each grid box with each cluster indicated 
by different colors. The scatterplot shows the considerable structure that 
the clustering is able to distinguish. The spatial pattern of the clusters is 
also shown in Figure 6 (right frame). The dark red areas, for example, high- 
light a region associated with a strong increase in temperature and a strong 
decrease in total precipitation. 

It is also useful to consider other measures of uncertainty. Fields of stan- 
dard deviations are one approach, but when considering the multivariate 
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Fig. 6. Results from clustering the posterior means of the change m temperature and the 
change in total precipitation for the winter season. The left frame is a scatteplot with the 
clusters indicated through different colors. The right frame shows the clusters spatially. 

nature of this model output other possibihties arise. For example, Figure 7 
shows estimated pointwise probabilities of an increase in temperature (left 
frame), a decrease in total precipitation (middle frame), and a simultane- 
ous increase in temperature and decrease in total precipitation (right frame) 
based on a sampling of the posterior distribution of the joint spatial fields. 
In general, temperature is increasing on average across the entire domain; 
hence, these probabilities are based on increases larger than the median com- 
puted from all the samples across all the grid boxes. Likewise, the decrease 
in total precipitation was based on decreases larger than the median across 
all the samples from all the grid boxes. Again, on the basis of this model, 
we see evidence of widespread increase in temperature and decrease in total 
precipitation across the western U.S., with dominant features along the far 
western coast, the Pacific northwest, and isolated mountain regions. 




Fig. 7. Estimated pointwise probabilities for the winter season: increasing temperature 
(left), decreasing total precipitation (middle), and simultaneously increasing temperature 
and decreasing total precipitation (right). 
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Figure 8 shows an alternative representation of the joint distribution by 
considering conditional probabilities. The figure shows the probability of the 
decrease in total precipitation being in the top quartile, conditional on the 
increase in temperature being in the first quartile (top left), second quartile 
(top right), third quartile (bottom left) and fourth quartile (bottom right). 
As the temperature increase becomes more extreme, the largest decreases in 
total precipitation move from being focused in the southwest (and the Cal- 
ifornia coast) to the Pacific northwest (and the California coast). Relative 
changes can also be considered and, although not shown here, the normaliza- 
tion minimizes the impact on the Pacific northwest (higher absolute changes 
in total precipitation, but also higher total precipitation values in general). 

Finally, Figure 9 shows approximate 95% contours for the joint change 
in (average winter) temperature and total precipitation, but focuses on the 
grid boxes that represent the five consolidated metropolitan statistical areas 
(as defined by the U.S. Census) that are included in the domain. Figure 9 
suggests that the projection for the Denver area includes average increases in 




Fig. 8. Probability of a large decrease in winter total precipitation, conditional on the 
increase in temperature falling in the first quartile (top left), second quartile (top right), 
third quartile (bottom left), and fourth quartile (bottom right). 
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temperature of just under 1°K with minimal average decreases in total pre- 
cipitation. The contour for the Sacramento area, on the other hand, suggests 
much larger average increases in temperature and average decreases in total 
precipitation. We believe that such plots, summarizing the joint distribution 
based on the statistical model, will have great interest and application to 
scientists and decision-makers interested in the impacts of climate change. 

5.3. Results for the summer season. A slightly different scheme was used 
for the Gibbs sampler for the analysis of the summer model output. The 
three-regime sampling was still used, although with twice as many iterations 
(20,000) in the second regime. In addition, all five conditional-dependence 
parameters (p, (pu, 4>22, 4>i2, and 4>2i) were updated simultaneously. Con- 
vergence of parameters for the summer season was similar to that for the 
winter season, although somewhat slower, and the specifics of those results 
are not discussed here. Distributions of the posteriors for the parameters 
were similar, with the exception of the cross-dependece parameters, and, in 
particular, p (posterior mean of —0.41 for summer versus a posterior mean 
of —0.12 for winter), suggesting that the summer season has a much stronger 
and more negative correlation between the change in temperature and the 
change in total precipitation. 

Figure 10 shows posterior means for the summer season with the same 
layout as in Figure 5 for the winter season. Now there appears to be a west- 
to-east gradient in the fixed effects for temperature, and, again, the spatial 
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Fig. 10. Posterior means for the regression (left), the spatial effect (middle), and the sum 
(right) for the summer season. The top row represents the change m midpoint temperature 
(° K), while the bottom represents the change in total precipitation (inches). 

random effects pick up more of tlie topography that is not accounted for in 
the fixed effects. Again, for temperature, the spatial random effects are of 
smaller magnitude than the fixed effects. 

For total precipitation, there is also a west-to-east gradient in the fixed 
effects. The spatial random effects for the change in total precipitation also 
follow the features of the topography, but there are strong local features, now 
occurring in the eastern part of the domain. In comparison to temperature, 
the spatial random effects are larger relative to the fixed effects. 

The sum of the fixed effects and the spatial random effects for tempera- 
ture shows a consistent pattern of summer warming on average throughout 
the west, while the sum for total precipitation shows patterns that are much 
more localized, just as in the winter season. However, the most dominant 
features for total precipitation are the regions of decrease in summer total 
precipitation in the eastern part of the domain. Again, a hierarchical cluster- 
ing was performed on samples from the posterior distribution for each grid 
box, which is summarized in Figure 11. There appears to be more widespread 
warming and decreasing total precipitation during the summer months, but 
the clustering again highlights structure in the joint distribution. 

As in Figure 7, Figure 12 shows estimated pointwise probabilities of an 
increase in temperature (left frame), a decrease in total precipitation (middle 
frame), and a simultaneous increase in temperature and decrease in total 
precipitation (right frame) based on a sampling of the posterior distribution 
of the joint spatial fields. In general, summer warming and decreasing total 
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Fig. 11. Results from clustering the posterior means of the change in temperature and 
total precipitation for the summer season. The left frame is a scatterplot with the clusters 
indicated through different colors. The right frame shows the clusters spatially. 

precipitation is widespread, even more so than in the winter season, and 
focused more on the eastern side of the domain. 

Figm'e 13 is constructed similarly to Figure 8. However, the stronger neg- 
ative correlation in the summer season is more apparent in the figure. The 
large decreases in total precipitation are strongest in the eastern portion of 
the domain, but decrease dramatically when we condition on larger increases 
in temperature. 

Finally, Figure 14 shows approximate 95% contours for the joint change 
in (average summer) temperature and total precipitation. In this case, the 
roles are reversed from the winter season. The contours suggest larger in- 
creases in temperature and decreases in total precipitation (on average) for 
the Denver area, while the contour for the Sacramento area suggests more 
modest changes on both variables. 




Fig. 12. Estimated pointwise probabilities for the summer season: increasing temperature 
(left), decreasing total precipitation (middle), and simultaneously increasing temperature 
and decreasing total precipitation (right). 
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Fig. 13. Probability of a large decrease in summer total precipitation, conditional on the 
increase in temperature falling in the first quartile (top left), second quartile (top right), 
third quartile (bottom left), and fourth quartile (bottom right). 

6. Concluding remarks. Climate models have become an important tool 
in the study of climate and climate change. Ensemble experiments of climate- 
model output, be they comprised of perturbed initial conditions, perturbed 
physics, or multiple models, have also become important in studying and 
quantifying the uncertainty in climate- model output. However, there are 
typically only a limited number of runs that can be produced due to the 
time and expense of running these models, even on modern supercomputers. 
Hence, statistical methods become necessary to quantify the distribution and 
the breadth of variation in the model output. 

With this idea in mind, we have introduced a hierarchical spatial sta- 
tistical model designed primarily for the analysis of regional-climate-model 
output on the basis of a simple ensemble (perturbed initial conditions). This 
model is multivariate and has the capacity to simultaneously characterize 
multiple model outputs, for example, the average change in temperature and 
the average change in total precipitation. While analysis of the individual 
model outputs might yield estimates of marginal distributions, the strong 
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Fig. 14. Approximate 95% contours for the average change in summer temperature and 
precipitation for the grid boxes associated with the five consolidated metropolitan statistical 
areas m the domain. 

correlations across variables, such as those uncovered here in the analysis 
of the average changes in summer temperature and the average changes in 
total precipitation, make a multivariate analysis crucial for joint inference. 

The statistical model also captures the spatial variation in the model out- 
put through a novel implementation of a multivariate MRF. In addition to 
the computational benefits arising from using models based on an MRF, this 
formulation of a multivariate MRF has a great deal of flexibility in modeling 
the conditional-dependence structure and is easily extendable. For example, 
more complex neighborhood structures can be considered [e.g., Sain, Furrer 
and Cressie (2007)], and it is not difficult to conceptualize how one might 
even consider modeling the joint distribution of multiple variables that are on 
different lattices. Connections to graphical models [e.g., Whittaker (1990)] 
could lead to further insights into modeling and parameter estimation. The 
computational impact and practical utility of considering additional vari- 
ables (increasing p) or additional ensemble members (increasing m) is also 
of interest, although, at least in climate model research, these are highly de- 
pendent on the application and the computational demands associated with 
running climate models on high-performance computers. These are issues 
that we are currently considering. We also note that this work adds to a 
growing collection of research involving the study and modeling of asym- 
metric cross-dependence structures for multivariate spatial data, including, 
for example, Jin, Carlin and Banerjee (2005) and Sain and Cressie (2007) 
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in the case of MRFs and Royle and Berliner (1999), Ver Hoef, Cressie and 
Barry (2004), and Apanasovich and Genton (2010) for geostatistical data. 

There is great interest in more complex ensembles, such as perturbed- 
physics experiments and multi-model ensembles. We have not considered 
such ensembles here, as we have focused on the multivariate aspect of the 
analysis of simple ensembles of regional-climate- model output. However, 
the model presented here could be extended to consider such ensembles by 
straightforward modifications to the process model, in particular, equation 
(8) could be modified to allow for a spatial meta-analysis component [e.g., 
Kang, Cressie and Sain (2010)] or through a functional analysis of variance 
similar to that of Kaufman and Sain (2010). Aside from the obvious compu- 
tational challenges to simply fitting such models in the multivariate setting, 
there is, of course, more work needed to quantify the variation associated 
with different model physics or different models. 

It is important to note that any conclusions taken from an analysis such 
as the one considered here are conditional on the assumptions implicit to 
the particular climate model or models used to generate the output fields. 
Whether global or regional in nature, climate models are typically con- 
structed to reproduce certain features in the current climate, and analyzing 
differences as we did should minimize the impact of any biases in the climate 
models (although this approach might be viewed with some healthy skepti- 
cism, as it is not clear that the biases in current runs are going to be the same 
as biases in future runs). Observations may be included to help constrain 
the statistical model, at least with respect to current climate. However, it 
is still an open question how to include observational data sets for spatial 
analyses of RCMs of the sort done here. Station-level data does not have 
the same spatial and temporal coverage, and there are also numerous ad- 
ditional issues with using interpolated data products, including reanalysis 
data that represent a data assimilation using both station-level data and 
climate-model output. 

In addition to the longitude, latitude, and elevation used in this anal- 
ysis, predictors based on climatology (long-run means of temperature and 
precipitation) were considered [e.g., Furrer et al. (2007a, 2007b)], but these 
were ultimately ruled out as not being effective at predicting the changes in 
temperature and precipitation. However, work done by Tebaldi et al. (2005) 
offers an approach for combining model output and observations. We are 
currently considering how their approach may be useful for spatial analyses 
of RCM output. 

Finally, there is also much interest, for example, from people examining 
the impacts of climate change, in combining model output in order to obtain 
improved projections of climate change or to span the variation across a 
climate-model experiment. Of course, with the more complex climate-model 
experiments, there is the issue of model-to-model correlations. We believe 
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that the inherent multivariate nature of this model provides an excellent 
starting place to consider such correlations. 
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